Mixing, Ergodic, and Nonergodic Processes with 
Rapidly Growing Information between Blocks 



Łukasz Dębowski



Abstract 

We construct mixing processes over an infinite alphabet and ergodic
processes over a finite alphabet for which Shannon mutual information
between adjacent blocks of length $n$ grows as $n^\beta$, where $\beta \in (0, 1)$. The
processes are a modification of nonergodic Santa Fe processes, which were
introduced in the context of natural language modeling. The rates of mutual
information for the latter processes are also established in
this paper. As an auxiliary result, it is shown that infinite direct products
of mixing processes are also mixing.

I Introduction



Let $H(X) := \mathbf{E}[-\log P(X)]$ denote the entropy of a discrete variable $X$ on
a probability space $(\Omega, \mathcal{J}, P)$, where $\mathbf{E}$ is the expectation with respect to $P$, $\log$
is the binary logarithm, and the variable $P(X)$ takes the value $P(X = x)$ for
$X = x$. We have the mutual information $I(X;Y) := H(X) + H(Y) - H(X,Y)$
for finite entropies on the right-hand side. Besides, we have the conditional
entropy $H(X|Z) = H(X,Z) - H(Z)$ and the conditional mutual information
$I(X;Y|Z) := H(X|Z) + H(Y|Z) - H(X,Y|Z)$. These definitions are generalized
to arbitrary random variables, e.g., in [?, ?].

Let $(X_i)_{i\in\mathbb{Z}}$ be a stationary process on $(\Omega, \mathcal{J}, P)$, where $X_i : (\Omega, \mathcal{J}) \to (\mathbb{X}, \mathcal{X})$.
For its distribution $\mu = P((X_i)_{i\in\mathbb{Z}} \in \cdot)$ we denote the mutual information
between blocks of length $n$ as

$$E_\mu(n) := I(X_{1:n}; X_{n+1:2n}). \qquad (1)$$

The limiting value of mutual information, called excess entropy, is defined as

$$E_\mu := I((X_i)_{i\le 0}; (X_i)_{i\ge 1}) = \lim_{n\to\infty} E_\mu(n). \qquad (2)$$

These quantities are natural measures of dependence for discrete-valued
processes [?]. We are interested in constructing diverse examples of stationary
measures for which

$$E_\mu(n) \asymp n^\beta, \qquad (3)$$

where $\beta \in (0, 1)$, because certain measures of this kind may be useful for
modeling natural language, cf. [?, ?].

Mentioning related results, let us first consider Gaussian processes. For
these processes the conditional mutual information equals
$I(X_0; X_n | (X_i)_{i=1}^{n-1}) = -\log(1 - |\alpha(n)|^2)$, where function $\alpha(k)$ is the partial
autocorrelation, cf. [?]. Regardless of the alphabet, the mutual information between
blocks may be reconstructed from conditional mutual information as

$$E_\mu(n) = \sum_{k=1}^{n-1} k\, I(X_0; X_k | (X_i)_{i=1}^{k-1}) + \sum_{k=n}^{2n-1} (2n - k)\, I(X_0; X_k | (X_i)_{i=1}^{k-1}). \qquad (4)$$

Thus the asymptotics (3) holds if and only if
$\sum_{k=1}^{n} k\, I(X_0; X_k | (X_i)_{i=1}^{k-1}) \asymp n^\beta$. As a result,
the construction of processes that satisfy condition (3) is easy because the sole
constraint on partial correlation reads $|\alpha(k)| < 1$ [7]. However, a classical
result [?] says that excess entropy of nonsingular Gaussian autoregressive moving
average (ARMA) processes is finite, cf. [?, Theorem 9.4.1], [?, Section 5.5].

Some examples of stationary processes for which excess entropy is infinite are
also known for discrete-valued processes. The trivial example for a countably
infinite alphabet is a process such that $X_i$ does not depend on $i$ and $H(X_i) = \infty$.
Then we have $E_\mu(n) = \infty$ for any $n \ge 1$. The aforementioned construction is
impossible for processes over a finite alphabet. Considering those processes,
we mention first that asymptotics $E(n) = (k/2) \log(n/2\pi e) + O(1)$ holds for
any Bayesian mixture of a $k$-parameter model with a prior concentrated on
a subset of parameters with bounded Fisher information [?, Theorem 8.3].
Similar asymptotics $E(n) \asymp \log n$ holds for a binary process constructed by
Gramss [12], cf. [13]. The distribution of that process is formed by the
frequencies of 0's and 1's in the rabbit sequence. As for processes with infinite
excess entropy that are mixing, Bradley [15] constructed a binary process which
satisfies two conditions, cf. [16]: (i) the process is $\rho$-mixing and (ii) the restricted
measure $P((X_i)_{i\le 0 \vee i\ge n} \in \cdot)$ is singular with respect to the product measure
$P((X_i)_{i\le 0} \in \cdot) \times P((X_i)_{i\ge n} \in \cdot)$ for any $n \ge 1$ [?, Lemma 3]. The first
property implies that the process is mixing in the ordinary ergodic theoretic
sense [17, Volume 1, Chapters 3 and 5]. The second property implies that the
excess entropy is infinite.

A few other examples concern hidden Markov chains. By the data processing
inequality, excess entropy is finite for hidden Markov chains with a finite
number of hidden states [18]. On the other hand, if the distribution of ergodic
components of a stationary process has infinite entropy then the process has
infinite excess entropy [?, Theorem 5]. Such a situation may arise for hidden
Markov chains with a countably infinite number of hidden states. (Consider
for instance a mixture of periodic processes where the probability of a period
is a sufficiently slowly decreasing function of the cycle length [?].) A less trivial
example, constructed in [19], is a stationary ergodic hidden Markov chain
with infinite excess entropy, a finite number of output symbols, and a countably
infinite alphabet of hidden states.

In this paper we will consider another class of processes that are nonergodic,
ergodic, or mixing and satisfy condition (3). The construction of these processes
is motivated linguistically. Let us first sketch this motivation. In our previous
work we have shown that proportionality (3) implies a power law which
resembles Zipf's law for the distribution of words. Namely, product $E_\mu(n) \log n$
is upper bounded by the expected vocabulary size of an admissibly minimal
grammar for the text of length $n$. It was empirically observed that the latter
quantity approximates the number of distinct words for texts in natural language
[20]. Our bound for mutual information and the vocabulary size holds if the
alphabet $\mathbb{X}$ is finite and the process's distribution has finite energy property [?,
Theorem 3]. There is also another linguistically motivated bound for $E_\mu(n)$.
That one is a lower bound. Namely, asymptotics

$$\limsup_{n\to\infty} E_\mu(n)/n^\beta > 0 \qquad (5)$$

follows from a hypothesis that texts describe an infinite random object in
a highly repetitive way so that $n^\beta$ independent facts about the object can be
inferred on average from the text of length $n$ [?, Theorem 2].

The goal of this paper is to prove the stronger asymptotics (3) for processes
that were discussed in [5] and to define a new model of texts that describe a random
object. So far, we have considered objects that do not change in time. This
leads to models of texts being nonergodic measures. Here, we will admit objects
that evolve slowly. That leads to models of texts which are mixing measures and
still satisfy proportionality (3). In this way, linguistic inspiration contributes to
better understanding of yet another problem in information theory.

Let us introduce our basic example. Throughout this paper, $(X_i)_{i\in\mathbb{Z}}$ denotes
a stationary process on $(\Omega, \mathcal{J}, P)$ with $X_i : (\Omega, \mathcal{J}) \to (\mathbb{X}, \mathcal{X})$ and $\mathbb{X} = \mathbb{N} \times \{0, 1\}$,
where $\mathbb{N}$ is the set of positive integers. In the series of papers [?, 21, ?] we have
examined some properties of the following process $(X_i)_{i\in\mathbb{Z}}$, called the (original)
Santa Fe process in [?]. Namely, the variables $X_i$ consist of pairs

$$X_i = (K_i, Z_{K_i}), \qquad (6)$$

where processes $(K_i)_{i\in\mathbb{Z}}$ and $(Z_k)_{k\in\mathbb{N}}$ are independent and distributed as follows.
First, variables $Z_k$ are binary and equidistributed,

$$P(Z_k = 0) = P(Z_k = 1) = 1/2, \qquad (Z_k)_{k\in\mathbb{N}} \sim \text{IID}. \qquad (7)$$

Second, variables $K_i$ obey the power law

$$P(K_i = k) = k^{-1/\beta}/\zeta(\beta^{-1}), \qquad (K_i)_{i\in\mathbb{Z}} \sim \text{IID}, \qquad (8)$$

where $\beta \in (0, 1)$ and $\zeta(x) = \sum_{k=1}^\infty k^{-x}$ is the zeta function.
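
The definition (6)-(8) can be illustrated by a short simulation. The following is only a sketch under stated assumptions: the power law is truncated to $k \le$ `k_max` and renormalized (the names `sample_santa_fe`, `k_max`, and `seed` are hypothetical and do not come from the paper), so the output approximates rather than reproduces the distribution (8).

```python
import random


def sample_santa_fe(n, beta=0.5, k_max=10_000, seed=0):
    """Sample X_1..X_n with X_i = (K_i, Z_{K_i}).

    Assumption: the power law P(K_i = k) ~ k^(-1/beta) is truncated to
    k <= k_max and renormalized, so formula (8) is only approximated.
    """
    rng = random.Random(seed)
    support = range(1, k_max + 1)
    weights = [k ** (-1.0 / beta) for k in support]
    ks = rng.choices(support, weights=weights, k=n)
    bits = {}  # fair bits Z_k, drawn lazily and fixed once revealed
    xs = []
    for k in ks:
        if k not in bits:
            bits[k] = rng.randint(0, 1)
        xs.append((k, bits[k]))
    return xs


if __name__ == "__main__":
    print(sample_santa_fe(10))
```

Because each bit is stored in the dictionary `bits` after its first use, two sampled pairs with the same address always carry the same bit value.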

Let us recall that $\mu = P((X_i)_{i\in\mathbb{Z}} \in \cdot)$ and $E_\mu(n) = I(X_{1:n}; X_{n+1:2n})$. The
first new result of this paper is:

Proposition 1 The block mutual information $E_\mu(n)$ for the original Santa Fe
process $(X_i)_{i\in\mathbb{Z}}$, given by formula (6), obeys

$$\lim_{n\to\infty} \frac{E_\mu(n)}{n^\beta} = \frac{(2 - 2^\beta)\Gamma(1 - \beta)}{[\zeta(\beta^{-1})]^\beta}. \qquad (9)$$

The calculation of the limit is facilitated by a decomposition of mutual information
between blocks $X_{1:n}$ and $X_{n+1:2n}$ into a series of triple information among
blocks $X_{1:n}$ and $X_{n+1:2n}$ and variables $Z_k$. This decomposition is a particular
property of the Santa Fe process and some similar measures.
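
The limit in Proposition 1 can be checked numerically against the series $\sum_k (1 - (1 - b_k)^n)^2$ with $b_k = P(K_i = k)$, which the proof in Section II derives for $E_\mu(n)$. The sketch below assumes $\beta = 1/2$, so that $\zeta(\beta^{-1}) = \zeta(2) = \pi^2/6$ is known in closed form, and it cuts the series at a hypothetical bound `k_max`; the function names are illustrative only.

```python
import math


def block_information(n, beta=0.5, k_max=200_000):
    """Truncated series sum_k (1 - (1 - b_k)^n)^2, b_k = k^(-1/beta) / zeta(1/beta).

    Assumptions: beta = 1/2 (zeta(2) = pi^2 / 6 is hard-coded) and the
    series is cut at k_max, which is harmless when k_max >> sqrt(n).
    """
    if beta != 0.5:
        raise ValueError("this sketch hard-codes zeta(2), so beta must be 0.5")
    a = 6.0 / math.pi ** 2
    total = 0.0
    for k in range(1, k_max + 1):
        b = a * k ** (-1.0 / beta)
        # 1 - (1 - b)^n, computed stably for very small b
        hit = -math.expm1(n * math.log1p(-b))
        total += hit * hit
    return total


def limit_constant(beta=0.5):
    """Right-hand side of the limit for beta = 1/2."""
    zeta = math.pi ** 2 / 6
    return (2 - 2 ** beta) * math.gamma(1 - beta) / zeta ** beta


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        print(n, block_information(n) / n ** 0.5, limit_constant())
```

The ratio approaches the constant slowly, since the series and its integral approximation differ by a bounded additive term, which is of relative order $n^{-\beta}$.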

The uncommon construction of process (6) can be interpreted in this way.
Imagine that the Santa Fe process is a sequence of statements which describe
a random object $(Z_k)_{k\in\mathbb{N}}$ consistently. Each statement $X_i = (k, z)$ reveals both
the address $k$ of a random bit of $(Z_k)_{k\in\mathbb{N}}$ and its value $Z_k = z$. Observe that
the description is repetitive and consistent: if two statements $X_i = (k, z)$ and
$X_j = (k', z')$ describe bits of the same address ($k = k'$) then they always assert
the same bit value ($z = z'$). It follows hence that variables $Z_k$ can be predicted
from realization $(X_i)_{i\in\mathbb{Z}}$ in a shift-invariant way and therefore the Santa Fe
process is (strongly) nonergodic, cf. [?, ?, Definition 1].

Now let us introduce an example of a mixing process which satisfies (3). For
this goal, we will replace individual variables $Z_k$ in the Santa Fe process with
Markov chains $(Z_{ik})_{i\in\mathbb{Z}}$. These Markov chains will be obtained by iterating
a binary symmetric channel. Subsequently, the following process $(X_i)_{i\in\mathbb{Z}}$ will
be called the generalized Santa Fe process. Let us put

$$X_i = (K_i, Z_{i,K_i}), \qquad (10)$$

where processes $(K_i)_{i\in\mathbb{Z}}$ and $(Z_{ik})_{i\in\mathbb{Z}}$, where $k \in \mathbb{N}$, are independent and
distributed as follows. First, variables $K_i$ are distributed according to formula
(8), as before. Second, each process $(Z_{ik})_{i\in\mathbb{Z}}$ is a Markov chain with marginal
distribution

$$P(Z_{ik} = 0) = P(Z_{ik} = 1) = 1/2 \qquad (11)$$

and cross-over probabilities

$$P(Z_{ik} = 0 | Z_{i-1,k} = 1) = P(Z_{ik} = 1 | Z_{i-1,k} = 0) = p_k. \qquad (12)$$

A linguistic interpretation of this process is as follows. Facts that are mentioned
in texts repeatedly fall roughly under two types, as mentioned in the
discussion of Definition 1 in [?]: (i) facts about objects that do not change in
time (like mathematical or physical constants), and (ii) facts about objects that
evolve with a varied speed (like culture, language, or geography). The random
object $(Z_k)_{k\in\mathbb{N}}$ described by the original Santa Fe process does not evolve, or
rather, no bit $Z_k$ is ever forgotten once revealed. On the other hand, the object
$(Z_{ik})_{k\in\mathbb{N}}$ described by the generalized Santa Fe process is a function of an
instant $i$ and the probability that the $k$-th bit flips at a given instant equals $p_k$.
For vanishing cross-over probabilities, the generalized Santa Fe process collapses
to the original process.
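
The generalized process can be simulated without storing whole chains, because a symmetric binary Markov chain observed at times that are $m$ steps apart keeps its value with probability $(1 + (1 - 2p_k)^m)/2$. The sketch below relies on this standard fact; the truncation `k_max` and the names `stay_probability`, `sample_generalized_santa_fe`, and `flip_prob` are illustrative assumptions rather than the paper's notation.

```python
import random


def stay_probability(p, m):
    """P(Z_{i,k} = Z_{i-m,k}) for a symmetric binary chain with flip probability p."""
    return 0.5 * (1.0 + (1.0 - 2.0 * p) ** m)


def sample_generalized_santa_fe(n, flip_prob, beta=0.5, k_max=10_000, seed=0):
    """Sample X_i = (K_i, Z_{i,K_i}) for i = 0..n-1.

    flip_prob maps an address k to its cross-over probability p_k.
    Assumption: the power law for K_i is truncated to k <= k_max.
    """
    rng = random.Random(seed)
    support = range(1, k_max + 1)
    weights = [k ** (-1.0 / beta) for k in support]
    ks = rng.choices(support, weights=weights, k=n)
    last = {}  # address k -> (time of last visit, bit value at that time)
    xs = []
    for i, k in enumerate(ks):
        if k not in last:
            z = rng.randint(0, 1)  # uniform stationary marginal (11)
        else:
            j, z_old = last[k]
            keep = rng.random() < stay_probability(flip_prob(k), i - j)
            z = z_old if keep else 1 - z_old
        last[k] = (i, z)
        xs.append((k, z))
    return xs


if __name__ == "__main__":
    print(sample_generalized_santa_fe(10, lambda k: 0.5 * k ** -2.0))
```

With `flip_prob` identically zero the bits never change, which corresponds to the collapse to the original Santa Fe process.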

As we will establish later in this paper, the generalized Santa Fe process is
mixing for cross-over probabilities different from 0 or 1.

Proposition 2 The generalized Santa Fe process $(X_i)_{i\in\mathbb{Z}}$ given by formula (10)
is mixing for $p_k \in (0, 1)$.

The proof consists in noticing that infinite direct products of mixing processes
are mixing. This is an easy generalization of the well known fact for finite
products [22, Chapter 10, §1].

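We will also demonstrate this fact, which generalizes Proposition 1.

Proposition 3 The block mutual information $E_\mu(n)$ for the generalized Santa
Fe process $(X_i)_{i\in\mathbb{Z}}$ given by formula (10) obeys

$$\limsup_{n\to\infty} \frac{E_\mu(n)}{n^\beta} \le \frac{(2 - 2^\beta)\Gamma(1 - \beta)}{[\zeta(\beta^{-1})]^\beta}. \qquad (13)$$

The lower limits in particular cases are as follows:

(i) If $p_k \le P(K_i = k)$ then

$$\liminf_{n\to\infty} \frac{E_\mu(n)}{n^\beta} \ge \frac{A(\beta)}{[\zeta(\beta^{-1})]^\beta}, \qquad (14)$$

where

$$A(\beta) := \sup_{\delta\in(1/2,1)} (1 - \eta(\delta))\, \beta \int_{\sqrt{\delta}}^{1} \frac{(1-u)^2}{u(-\ln u)^{\beta+1}}\, du \qquad (15)$$

and $\eta(\delta)$ is the entropy of binary distribution $(\delta, 1 - \delta)$,
$\eta(\delta) := -\delta \log \delta - (1 - \delta) \log(1 - \delta)$.

(ii) If $\lim_{k\to\infty} p_k/P(K_i = k) = 0$ then $E_\mu(n)$ obeys (9).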

Now let us introduce a similar ergodic process over a finite alphabet. For
this goal we use a transformation of processes over an infinite alphabet into
processes over a finite alphabet that preserves stationarity and (non)ergodicity
and does not distort entropy too much, as we have shown in [21]. We call
this transformation stationary (variable-length) coding. (The same or a similar
construction has been considered in [23, 24, 25].) It is a composition of two
operations.

First, let a function $f : \mathbb{X} \to \mathbb{Y}^*$, called a coding function, map symbols from
alphabet $\mathbb{X}$ into strings over another alphabet $\mathbb{Y}$. We define its extension to
double infinite sequences $f^{\mathbb{Z}} : \mathbb{X}^{\mathbb{Z}} \to \mathbb{Y}^{\mathbb{Z}} \cup (\mathbb{Y}^* \times \mathbb{Y}^*)$ as

$$f^{\mathbb{Z}}((x_i)_{i\in\mathbb{Z}}) := \ldots f(x_{-1}) f(x_0)\,\mathbf{.}\, f(x_1) f(x_2) \ldots, \qquad (16)$$

where $x_i \in \mathbb{X}$ and the bold-face dot separates the 0-th and the first symbol.
Then for a stationary process $(X_i)_{i\in\mathbb{Z}}$ on $(\Omega, \mathcal{J}, P)$, where variables $X_i$ take
values in space $(\mathbb{X}, \mathcal{X})$, we introduce process

$$(Y_i)_{i\in\mathbb{Z}} := f^{\mathbb{Z}}((X_i)_{i\in\mathbb{Z}}), \qquad (17)$$

where variables $Y_i$ take values in space $(\mathbb{Y}, \mathcal{Y})$, as long as the right-hand side is
a double infinite sequence almost surely.

The second operation is as follows. Transformation (17) does not preserve
stationarity in general but process $(Y_i)_{i\in\mathbb{Z}}$ is asymptotically mean stationary
(AMS) under mild conditions [21, Proposition 2.3], which are satisfied in the
setting considered further. Then for the distribution

$$P((Y_i)_{i\in\mathbb{Z}} \in \cdot) = \nu \qquad (18)$$

and the shift operation $T((y_i)_{i\in\mathbb{Z}}) := (y_{i+1})_{i\in\mathbb{Z}}$, there exists a stationary measure

$$\bar\nu(A) := \lim_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} \nu \circ T^{-i}(A), \qquad (19)$$

called the stationary mean of $\nu$ [?, 21]. It is convenient to suppose that
probability space $(\Omega, \mathcal{J}, P)$ is rich enough to support a process $(\bar Y_i)_{i\in\mathbb{Z}}$ with the
distribution

$$P((\bar Y_i)_{i\in\mathbb{Z}} \in \cdot) = \bar\nu. \qquad (20)$$

Whereas process $(Y_i)_{i\in\mathbb{Z}}$ need not be stationary, process $(\bar Y_i)_{i\in\mathbb{Z}}$ is stationary
and will be called the stationary (variable-length) coding of $(X_i)_{i\in\mathbb{Z}}$.

Processes $(X_i)_{i\in\mathbb{Z}}$, $(Y_i)_{i\in\mathbb{Z}}$, and $(\bar Y_i)_{i\in\mathbb{Z}}$ have isomorphic shift-invariant
algebras for some nice coding functions, called synchronizable injections [21,
Proposition 3.3]. For example, for the infinite alphabet $\mathbb{X} = \mathbb{N} \times \{0, 1\}$, let us assume
the ternary alphabet $\mathbb{Y} = \{0, 1, 2\}$ and the coding function

$$f(k, z) = b(k)z2, \qquad (21)$$

where $b(k) \in \{0, 1\}^*$ is the binary representation of a natural number $k$ stripped
of the leading digit 1. Coding function (21) is an instance of a synchronizable
injection. Hence we have the following fact:

Proposition 4 Let $(\bar Y_i)_{i\in\mathbb{Z}}$ be the stationary coding obtained from applying the
coding function (21) to the generalized Santa Fe process (10). Process $(\bar Y_i)_{i\in\mathbb{Z}}$
is nonergodic if $p_k = 0$ and ergodic if $p_k \in (0, 1)$.
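
The coding function (21) is easy to make concrete. The sketch below encodes a pair $(k, z)$ as the string $b(k)z2$ and decodes a finite concatenation of code words by splitting at the delimiter 2; the helper names `b`, `encode`, and `decode` are chosen here for illustration, and the decoder handles only finite strings of complete code words, not the double infinite sequences of (16).

```python
def b(k):
    """Binary expansion of k >= 1 without its leading digit 1."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return bin(k)[3:]  # bin(5) == '0b101', so b(5) == '01'


def encode(k, z):
    """Code word f(k, z) = b(k) z 2 over the ternary alphabet {0, 1, 2}."""
    return b(k) + str(z) + "2"


def decode(text):
    """Recover the pairs (k, z) from a concatenation of complete code words."""
    pairs = []
    for word in text.split("2")[:-1]:
        bits, z = word[:-1], int(word[-1])
        pairs.append((int("1" + bits, 2), z))
    return pairs


if __name__ == "__main__":
    message = [(1, 0), (5, 1), (12, 0)]
    coded = "".join(encode(k, z) for k, z in message)
    print(coded, decode(coded))
```

The symbol 2 occurs only at the end of a code word, which is what allows a reader of the coded text to find code word boundaries.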

Notice, however, that the stationary coding of a mixing process is not mixing
for a synchronizable coding function in general. For example, if we take the
generalized Santa Fe process and the coding function $f(k, z) = 01$, which is also
a synchronizable injection, the stationary coding $(\bar Y_i)_{i\in\mathbb{Z}}$ is not mixing because
of periodic oscillations in the realizations of the process $(\bar Y_i)_{i\in\mathbb{Z}}$. Such regular
periods do not arise for the generalized Santa Fe process and the coding function
(21) since variables $|f(X_i)|$, where $|w|$ is the length of string $w$, differ from
constants and are independent and identically distributed. Thus, we conjecture
that the resulting process $(\bar Y_i)_{i\in\mathbb{Z}}$ is mixing for $p_k \in (0, 1)$.

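Now let us consider block mutual information for the stationary coding of
the generalized Santa Fe process. Let us recall that $\bar\nu = P((\bar Y_i)_{i\in\mathbb{Z}} \in \cdot)$ and
$E_{\bar\nu}(m) = I(\bar Y_{1:m}; \bar Y_{m+1:2m})$. As the last new result, we will show this fact:

Proposition 5 Let $(\bar Y_i)_{i\in\mathbb{Z}}$ be the stationary coding obtained from applying the
coding function (21) to the generalized Santa Fe process (10). Define the
expansion rate $L := \mathbf{E}|f(X_i)|$. The block mutual information $E_{\bar\nu}(m)$ for process
$(\bar Y_i)_{i\in\mathbb{Z}}$ satisfies

$$\limsup_{m\to\infty} \frac{E_{\bar\nu}(m)}{m^\beta} \le \frac{1}{L^\beta} \frac{(2 - 2^\beta)\Gamma(1 - \beta)}{[\zeta(\beta^{-1})]^\beta}. \qquad (22)$$

The lower limits in particular cases are as follows:

(i) If $p_k \le P(K_i = k)$ then

$$\liminf_{m\to\infty} \frac{E_{\bar\nu}(m)}{m^\beta} \ge \frac{1}{L^\beta} \frac{A(\beta)}{[\zeta(\beta^{-1})]^\beta}, \qquad (23)$$

where $A(\beta)$ is defined in (15).

(ii) If $\lim_{k\to\infty} p_k/P(K_i = k) = 0$ then

$$\lim_{m\to\infty} \frac{E_{\bar\nu}(m)}{m^\beta} = \frac{1}{L^\beta} \frac{(2 - 2^\beta)\Gamma(1 - \beta)}{[\zeta(\beta^{-1})]^\beta}. \qquad (24)$$

Proposition 5 follows from Proposition 3 by the conditional data processing
inequality and Chernoff bounds. This proposition strengthens inequality

$$\limsup_{m\to\infty} \frac{E_{\bar\nu}(m)}{m^\beta} > 0, \qquad (25)$$

which follows for $p_k = 0$ by [21, Proposition 1.4] and [5, Theorem 2].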

The further organization of this paper is as follows. The rate of mutual
information for the original and generalized Santa Fe processes is discussed in Section
II. The rate of mutual information for the stationary coding is established in
Section III. Subsequently, the mixing property for the generalized Santa Fe
process is shown in Appendix A. As an auxiliary result, we demonstrate that
infinite direct products of mixing processes are also mixing.



II The rate of mutual information 

In this section we evaluate the rate of block mutual information for the Santa
Fe process and its mixing counterpart. The main tool is conditional mutual
information for stochastic processes as discussed, e.g., in [?, ?].

Here are some facts about conditional information that will be used, cf. [?]:

(a) continuity $I(X; (Y_k)_{k\in\mathbb{N}} | Z) = \lim_{n\to\infty} I(X; (Y_k)_{k=1}^{n} | Z)$,

(b) chain rule $I(X; Y, Z | W) = I(X; Y | W) + I(X; Z | Y, W)$, and

(c) equality $I(X; Y | Z) = 0$ for $X$ and $Y$ conditionally independent given $Z$.

Two simple corollaries of the chain rule will be used as well:

(i) $I(X; Y) = I(X; Y; Z) + I(X; Y | Z)$ for $H(X), H(Y) < \infty$, where we define
triple information

$$I(X; Y; Z) := I(X; Z) + I(Y; Z) - I((X, Y); Z),$$

(ii) $I(X; Z | Y) = I(X; Z)$ for $X$ and $Y$ independent and conditionally
independent given $Z$.

The second identity follows from $I(X; (Y, Z)) = I(X; Y) + I(X; Z | Y) =
I(X; Z) + I(X; Y | Z)$ where both $I(X; Y) = 0$ and $I(X; Y | Z) = 0$.
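
Corollary (i) can be verified numerically on any finite joint distribution. The sketch below computes entropies from a joint probability table stored as a dictionary; all helper names are hypothetical. It also evaluates the triple information for $Z = X \oplus Y$ with independent fair bits $X$ and $Y$, a standard case in which the triple information is negative.

```python
import itertools
import math
import random


def entropy(pmf):
    """Shannon entropy in bits of a probability table given as a dict."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)


def marginal(joint, axes):
    """Marginal distribution of the coordinates listed in axes."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[a] for a in axes)
        out[key] = out.get(key, 0.0) + p
    return out


def H(joint, *axes):
    return entropy(marginal(joint, axes))


def mutual_information(joint, a, b):
    return H(joint, a) + H(joint, b) - H(joint, a, b)


def conditional_mutual_information(joint, a, b, c):
    return H(joint, a, c) + H(joint, b, c) - H(joint, a, b, c) - H(joint, c)


def triple_information(joint):
    """I(X;Y;Z) := I(X;Z) + I(Y;Z) - I((X,Y);Z) with axes 0, 1, 2 = X, Y, Z."""
    i_xy_z = H(joint, 0, 1) + H(joint, 2) - H(joint, 0, 1, 2)
    return mutual_information(joint, 0, 2) + mutual_information(joint, 1, 2) - i_xy_z


def random_joint(seed=0):
    """A random joint distribution on {0,1} x {0,1,2} x {0,1}."""
    rng = random.Random(seed)
    outcomes = list(itertools.product(range(2), range(3), range(2)))
    w = [rng.random() for _ in outcomes]
    return {o: x / sum(w) for o, x in zip(outcomes, w)}


if __name__ == "__main__":
    joint = random_joint()
    lhs = mutual_information(joint, 0, 1)
    rhs = triple_information(joint) + conditional_mutual_information(joint, 0, 1, 2)
    print(lhs, rhs)
```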

Now we can evaluate block mutual information $E_\mu(n)$ for the Santa Fe
processes. The case of the original Santa Fe process is simpler and will be
considered separately to guide the reader through the more complicated proof for the
generalized process.

Proof of Proposition 1: Notice that variables $Z_k$, $k \in \mathbb{N}$, are independent
and conditionally independent given any finite block $X_{n:m}$. Hence

$$I(X_{1:n}; (Z_k)_{k\in\mathbb{N}}) = \sum_{k=1}^{\infty} I(X_{1:n}; Z_k | Z_{1:k-1}) = \sum_{k=1}^{\infty} I(X_{1:n}; Z_k).$$

Also $X_{1:n}$ and $X_{n+1:2n}$ are conditionally independent given $(Z_k)_{k\in\mathbb{N}}$. Hence
$I(X_{1:n}; X_{n+1:2n} | (Z_k)_{k\in\mathbb{N}}) = 0$. Both results yield

$$E_\mu(n) = I(X_{1:n}; X_{n+1:2n})$$
$$= I(X_{1:n}; X_{n+1:2n}; (Z_k)_{k\in\mathbb{N}}) + I(X_{1:n}; X_{n+1:2n} | (Z_k)_{k\in\mathbb{N}})$$
$$= I(X_{1:n}; X_{n+1:2n}; (Z_k)_{k\in\mathbb{N}})$$
$$= 2 I(X_{1:n}; (Z_k)_{k\in\mathbb{N}}) - I(X_{1:2n}; (Z_k)_{k\in\mathbb{N}})$$
$$= \sum_{k=1}^{\infty} [2 I(X_{1:n}; Z_k) - I(X_{1:2n}; Z_k)]$$
$$= \sum_{k=1}^{\infty} I(X_{1:n}; X_{n+1:2n}; Z_k). \qquad (26)$$

Computing simple expressions

$$H(Z_k | X_{1:n}) = 1 \cdot P(K_i \ne k \text{ for all } i \in \{1, \ldots, n\}) + 0 \cdot P(K_i = k \text{ for some } i \in \{1, \ldots, n\}),$$
$$I(X_{1:n}; Z_k) = P(K_i = k \text{ for some } i \in \{1, \ldots, n\}) = 1 - [1 - P(K_i = k)]^n,$$

we obtain the triple mutual information

$$I(X_{1:n}; X_{n+1:2n}; Z_k) = (1 - [1 - P(K_i = k)]^n)^2$$
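and the block mutual information

$$E_\mu(n) = \sum_{k=1}^{\infty} \left(1 - \left(1 - A k^{-1/\beta}\right)^n\right)^2, \qquad (27)$$

where $A := 1/\zeta(\beta^{-1})$.

The right-hand side of (27) equals, up to an additive constant $\le 1$, the
integral

$$\int_1^\infty \left(1 - \left(1 - A k^{-1/\beta}\right)^n\right)^2 dk = \beta (An)^\beta \int_{(1-A)^n}^{1} f_n(u)\, du,$$

where we use substitution

$$u := \left(1 - A k^{-1/\beta}\right)^n \qquad (28)$$

and functions

$$f_n(u) := \frac{(1-u)^2}{u^{1-1/n} \left[n(1 - u^{1/n})\right]^{\beta+1}}. \qquad (29)$$

We have the limit

$$\lim_{n\to\infty} f_n(u) = f(u) := \frac{(1-u)^2}{u(-\ln u)^{\beta+1}}$$

with the upper bound

$$\frac{f_n(u)}{f(u)} = \frac{x^{\beta+1} e^{\beta x}}{(e^x - 1)^{\beta+1}} \le 1, \qquad x := -\frac{\ln u}{n}, \qquad u, \beta \in (0, 1).$$

Moreover, function $f(u)$ is integrable on $u \in (0, 1)$. Hence

$$\lim_{n\to\infty} \frac{E_\mu(n)}{n^\beta} = \beta A^\beta \int_0^1 f(u)\, du$$

follows by the dominated convergence theorem.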

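It remains to compute $\int_0^1 f(u)\, du$. Putting $t := -\ln u$ yields

$$\int_0^1 f(u)\, du = \int_0^\infty (1 - e^{-t})^2 t^{-\beta-1}\, dt = \int_0^\infty \left[(e^{-2t} - 1) - 2(e^{-t} - 1)\right] t^{-\beta-1}\, dt = (2 - 2^\beta)\beta^{-1}\Gamma(1 - \beta),$$

where integral

$$\int_0^\infty (e^{-kt} - 1)\, t^{-\beta-1}\, dt = -\int_0^\infty (-k e^{-kt})(-\beta^{-1})\, t^{-\beta}\, dt = -k^\beta \beta^{-1} \Gamma(1 - \beta)$$

can be integrated by parts for the considered $\beta$. □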
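
The closed form $(2 - 2^\beta)\beta^{-1}\Gamma(1 - \beta)$ can be cross-checked by a crude quadrature. The sketch below is an illustration only: it applies the midpoint rule on $[0, t_{\max}]$ after the substitution $t = s^2$ and adds the tail $t_{\max}^{-\beta}/\beta$, which results from replacing $(1 - e^{-t})^2$ by 1 beyond $t_{\max}$. The names `integral_of_f`, `closed_form`, `t_max`, and `steps` are assumptions introduced here.

```python
import math


def integral_of_f(beta, t_max=40.0, steps=200_000):
    """Approximate the integral of (1 - e^{-t})^2 t^(-beta-1) over (0, inf).

    [0, t_max] is handled by the midpoint rule in the variable s = sqrt(t);
    beyond t_max the factor (1 - e^{-t})^2 is replaced by 1, which
    contributes the tail t_max^(-beta) / beta.
    """
    s_max = math.sqrt(t_max)
    h = s_max / steps
    total = 0.0
    for j in range(steps):
        s = (j + 0.5) * h
        t = s * s
        total += math.expm1(-t) ** 2 * t ** (-beta - 1.0) * 2.0 * s
    return total * h + t_max ** (-beta) / beta


def closed_form(beta):
    return (2.0 - 2.0 ** beta) * math.gamma(1.0 - beta) / beta


if __name__ == "__main__":
    for beta in (0.3, 0.5, 0.8):
        print(beta, integral_of_f(beta), closed_form(beta))
```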





Next, we prove the more general statement, partly using the preceding proof.

Proof of Proposition 3: Observe that processes $\mathbf{Z}_k := (Z_{ik})_{i\in\mathbb{Z}}$, where $k \in \mathbb{N}$,
are independent and conditionally independent given any finite block $X_{n:m}$.
Also $X_{1:n}$ and $X_{n+1:2n}$ are conditionally independent given $(\mathbf{Z}_k)_{k\in\mathbb{N}}$. Thus we
obtain

$$E_\mu(n) = \sum_{k=1}^{\infty} I(X_{1:n}; X_{n+1:2n}; \mathbf{Z}_k)$$

by replacing $Z_k$ with $\mathbf{Z}_k$ in derivation (26) from the previous proof.

By the assumed Markov property, process $\mathbf{Z}_k = (Z_{ik})_{i\in\mathbb{Z}}$ is independent
from $X_{1:n}$ given $(Z_{ik})_{1\le i\le n}$. This yields

$$I(X_{1:n}; \mathbf{Z}_k) = I(X_{1:n}; (Z_{ik})_{1\le i\le n}).$$

The expressions on the right-hand side can be analyzed as

$$I(X_{1:n}; (Z_{jk})_{1\le j\le n}) = \sum_{i=1}^{n} I(X_i; Z_{ik} | X_{1:i-1})$$

because $(Z_{jk})_{1\le j\le n}$ is independent from $X_i$ given $Z_{ik}$ and $X_{1:i-1}$. Moreover,

$$I(X_i; Z_{ik} | X_{1:i-1}) = H(Z_{ik} | X_{1:i-1}) - H(Z_{ik} | X_{1:i}).$$

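To evaluate the conditional entropies, put $a_{nk} := \eta(P(Z_{ik} = z | Z_{i-n,k} = z))$
and $b_k := P(K_i = k)$. Notice that by the Markovity of $(Z_{ik})_{i\in\mathbb{Z}}$ we have

$$H(Z_{ik} | X_{1:i-1}) = \sum_{n=1}^{i-1} a_{nk} P(K_j \ne k \text{ for } i - n < j \le i - 1) P(K_{i-n} = k) + \eta(P(Z_{ik} = z)) P(K_j \ne k \text{ for } 1 \le j \le i - 1)$$
$$= \sum_{n=1}^{i-1} a_{nk} b_k (1 - b_k)^{n-1} + (1 - b_k)^{i-1}.$$

Similarly, since $a_{0k} = 0$, we obtain

$$H(Z_{ik} | X_{1:i}) = \sum_{n=0}^{i-1} a_{nk} P(K_j \ne k \text{ for } i - n < j \le i) P(K_{i-n} = k) + \eta(P(Z_{ik} = z)) P(K_j \ne k \text{ for } 1 \le j \le i)$$
$$= \sum_{n=1}^{i-1} a_{nk} b_k (1 - b_k)^{n} + (1 - b_k)^{i}.$$

Thus we may reconstruct

$$I(X_i; Z_{ik} | X_{1:i-1}) = \sum_{n=1}^{i-1} a_{nk} b_k^2 (1 - b_k)^{n-1} + \left[(1 - b_k)^{i-1} - (1 - b_k)^{i}\right],$$

$$I(X_{1:n}; (Z_{ik})_{1\le i\le n}) = \sum_{m=1}^{n-1} (n - m) a_{mk} b_k^2 (1 - b_k)^{m-1} + \left[1 - (1 - b_k)^n\right],$$

and

$$I(X_{1:n}; X_{n+1:2n}; \mathbf{Z}_k) = 2 I(X_{1:n}; \mathbf{Z}_k) - I(X_{1:2n}; \mathbf{Z}_k)$$
$$= -\sum_{m=1}^{n-1} m\, a_{mk} b_k^2 (1 - b_k)^{m-1} - \sum_{m=n}^{2n-1} (2n - m) a_{mk} b_k^2 (1 - b_k)^{m-1} + \left[1 - (1 - b_k)^n\right]^2.$$

For a fixed $b_k$, we see that $I(X_{1:n}; X_{n+1:2n}; \mathbf{Z}_k)$ is minimized for $a_{mk} = 1$.
This case arises when $p_k = 1/2$ and $(Z_{ik})_{i\in\mathbb{Z}}$ are IID. A direct evaluation yields
then $H(Z_{ik} | X_{1:i-1}) = 1$, $H(Z_{ik} | X_{1:i}) = (1 - b_k)$, $I(X_{1:n}; (Z_{ik})_{1\le i\le n}) = n b_k$,
and $I(X_{1:n}; X_{n+1:2n}; \mathbf{Z}_k) = 0$. In this way we have proved that

$$\sum_{m=1}^{n-1} m\, b_k^2 (1 - b_k)^{m-1} + \sum_{m=n}^{2n-1} (2n - m) b_k^2 (1 - b_k)^{m-1} = \left[1 - (1 - b_k)^n\right]^2. \qquad (30)$$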

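On the other hand, $I(X_{1:n}; X_{n+1:2n}; \mathbf{Z}_k)$ is maximized for $a_{mk} = 0$. This holds
if $p_k = 0$ or $p_k = 1$. For $p_k = 0$, the process $(X_i)_{i\in\mathbb{Z}}$ collapses to the original
Santa Fe process (6). By equality (30), we obtain

$$I(X_{1:n}; X_{n+1:2n}; \mathbf{Z}_k) \in \left[(1 - \epsilon) \left[1 - (1 - b_k)^n\right]^2, \left[1 - (1 - b_k)^n\right]^2\right]$$

if $a_{mk} \le \epsilon$ for $m \le 2n - 1$. To bound coefficients $a_{mk}$, observe

$$P(Z_{ik} = z | Z_{i-n,k} = z) \ge (1 - p_k)^n.$$

Hence $a_{mk} \le \eta(\delta)$ for $m \le 2n - 1$ if $(1 - p_k)^{2n} \ge \delta > 1/2$. Thus we obtain

$$(1 - \eta(\delta)) \sum_{k\in\mathbb{N}:\, (1-p_k)^{2n} \ge \delta} \left[1 - (1 - b_k)^n\right]^2 \le E_\mu(n) \le \sum_{k\in\mathbb{N}} \left[1 - (1 - b_k)^n\right]^2. \qquad (31)$$

The most tedious part of the proof is completed.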

The limiting behavior of the upper bound in (31) has been analyzed in
the proof of Proposition 1 and by that reasoning (13) holds. Now we will
consider the limit of the lower bound in (31). As in the previous proof, we will
approximate the respective sum with an integral. Recall that $b_k = A k^{-1/\beta}$ with
$A = 1/\zeta(\beta^{-1})$. Let us define $b_k$ for real $k$ in the same way.

(i) For $p_k \le b_k$: notice that $(1 - b_k)^{2n} \ge \delta$ implies $(1 - p_k)^{2n} \ge \delta$. Thus
$E_\mu(n)/(1 - \eta(\delta)) + 1$ is greater than

$$\int_{(1-b_k)^n \ge \sqrt{\delta}} \left(1 - (1 - b_k)^n\right)^2 dk = \beta (An)^\beta \int_{\sqrt{\delta}}^{1} f_n(u)\, du,$$

where we use substitution (28) and functions (29). This yields (14) by the
dominated convergence theorem.

(ii) For $\lim_k p_k/b_k = 0$: let $k(n)$ be the largest number $k$ such that $(1 - p_k)^{2n} < \delta$
or put $k(n) = 1$ if there is no such number. Then $E_\mu(n)/(1 - \eta(\delta)) + 1$
is greater than

$$\int_{k(n)}^{\infty} \left(1 - (1 - b_k)^n\right)^2 dk = \beta (An)^\beta \int_{u(n)}^{1} f_n(u)\, du,$$

where $u(n) := (1 - b_{k(n)})^n$. We have $\lim_n u(n) = 0$ if $\lim_n k(n) < \infty$. On
the other hand, if $\lim_n k(n) = \infty$ then we use $\liminf_n n p_{k(n)} \ge -\ln \sqrt{\delta}$
and $\lim_k p_k/b_k = 0$ to infer $\lim_n n b_{k(n)} = \infty$ and hence $\lim_n u(n) = 0$.
Thus the dominated convergence theorem in both cases yields

$$\liminf_{n\to\infty} \frac{E_\mu(n)}{n^\beta} \ge (1 - \eta(\delta)) \beta A^\beta \int_0^1 f(u)\, du.$$

Taking $\delta \to 1$ gives (9).

□



III Encoding into a finite alphabet 

In this section we study the rate of mutual information for the stationary coding
of the generalized Santa Fe process. Let $|w|$ be the length of string $w$ and let
$(X_i)_{i\in\mathbb{Z}}$ denote the generalized Santa Fe process. For the coding function (21),
regardless of the value of $p_k$, the expansion rate

$$\lim_{n\to\infty} \frac{1}{n} \sum_{i=1}^{n} |f(X_i)| \qquad (32)$$

is almost surely constant and equals the expansion rate $L := \mathbf{E}|f(X_i)|$. Hence
the stationary coding $(\bar Y_i)_{i\in\mathbb{Z}}$ can be constructed as detailed below. This
construction was formally introduced in [21, Section 6] and justified by [21,
Proposition 2.3].

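Suppose that probability space $(\Omega, \mathcal{J}, P)$ is sufficiently rich to support some
previously unmentioned random variable $N : \Omega \to \mathbb{N} \cup \{0\}$, called a random
shift, and a nonstationary process $(\bar X_i)_{i\in\mathbb{Z}}$ where $\bar X_i : \Omega \to \mathbb{X}$. We assume that
$N$ and $(\bar X_i)_{i\in\mathbb{Z}}$ are conditionally independent given $\bar X_0$ and their distribution is

$$P(\bar X_{k:l} = x_{k:l}) = P(X_{k:l} = x_{k:l}) \cdot \frac{|f(x_0)|}{L}, \qquad k \le 0 \le l, \qquad (33)$$

$$P(N = n | \bar X_0 = x_0) = \frac{\mathbf{1}\{0 \le n \le |f(x_0)| - 1\}}{|f(x_0)|}, \qquad n \in \mathbb{N} \cup \{0\}. \qquad (34)$$

Process $(\bar Y_i)_{i\in\mathbb{Z}}$ with the desired distribution $\bar\nu = P((\bar Y_i)_{i\in\mathbb{Z}} \in \cdot)$, where $\nu =
P((Y_i)_{i\in\mathbb{Z}} \in \cdot)$ for $(Y_i)_{i\in\mathbb{Z}} = f^{\mathbb{Z}}((X_i)_{i\in\mathbb{Z}})$, can be obtained as

$$(\bar Y_i)_{i\in\mathbb{Z}} = T^{-N} f^{\mathbb{Z}}((\bar X_i)_{i\in\mathbb{Z}}), \qquad (35)$$

where $T((y_i)_{i\in\mathbb{Z}}) := (y_{i+1})_{i\in\mathbb{Z}}$ is the shift operation.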

Lemma 1 Denote blocks $X_{k:l}$ with $X_0$ removed as $X_{k:l\setminus 0}$. For the Santa Fe
processes, variables $\bar X_{k:l\setminus 0}$ and $X_{k:l\setminus 0}$ have the same distribution.

Proof: Notice that $|f(X_0)|$ does not depend on $Z_{0,K_0}$ and $K_0$ is independent
of $X_{k:l\setminus 0}$. Hence

$$P(\bar X_{k:l\setminus 0}) = P(X_{k:l\setminus 0}) \sum_{x_0} \frac{|f(x_0)|}{L} P(X_0 = x_0 | X_{k:l\setminus 0})$$
$$= P(X_{k:l\setminus 0}) \sum_{k_0, z_0} \frac{|f(k_0, z_0)|}{L} P(K_0 = k_0) P(Z_{0,k_0} = z_0 | X_{k:l\setminus 0})$$
$$= P(X_{k:l\setminus 0}). \qquad (36)$$

□

In the following we write $\bar L_i := |f(\bar X_i)|$ and $L_i := |f(X_i)|$. Variables $L_i$ are
independent and identically distributed. For these variables we define indices

$$L_t^+ := \frac{1}{t} \log \mathbf{E}\, 2^{t L_i}, \qquad (37)$$
$$L_t^- := -\frac{1}{t} \log \mathbf{E}\, 2^{-t L_i}, \qquad (38)$$

where $t > 0$. For the given distribution of $L_i$, we have $0 < L_t^-, L_t^+ < \infty$ for
sufficiently small $t$. By the Jensen inequality $L_t^+$ is a growing function of $t$ and $L_t^-$
is a decreasing function of $t$. Jensen inequality implies also $L_t^- \le L \le L_t^+$.



Lemma 2 We have

$$\lim_{t\to 0} L_t^+ = \lim_{t\to 0} L_t^- = L. \qquad (39)$$



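Proof: Consider function $g(t, x) := t^{-2}(2^{tx} - 1 - tx \ln 2)$. For $x > 0$, it is a growing
function of $t$. Consider next such a $t_0$ that $\mathbf{E}\, 2^{t_0 L_i} < \infty$. For $0 < t \le t_0$, we
obtain

$$\mathbf{E}\, 2^{t L_i} = 1 + t \ln 2\, \mathbf{E} L_i + t^2\, \mathbf{E}\, g(t, L_i) \le 1 + t \ln 2\, \mathbf{E} L_i + t^2\, \mathbf{E}\, g(t_0, L_i).$$

This yields

$$L \le L_t^+ \le \frac{1}{t} \log\left[1 + t \ln 2\, \mathbf{E} L_i + t^2\, \mathbf{E}\, g(t_0, L_i)\right] \xrightarrow[t\to 0]{} L.$$

On the other hand, for $t > 0$, we have

$$\mathbf{E}\, 2^{-t L_i} \le 1 - t \ln 2\, \mathbf{E} L_i + \frac{(t \ln 2)^2\, \mathbf{E} L_i^2}{2}.$$

Hence

$$L \ge L_t^- \ge -\frac{1}{t} \log\left[1 - t \ln 2\, \mathbf{E} L_i + \frac{(t \ln 2)^2\, \mathbf{E} L_i^2}{2}\right] \xrightarrow[t\to 0]{} L.$$

□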
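
The behavior of the indices $L_t^+$ and $L_t^-$ can be inspected numerically for the code word lengths $L_i = |b(K_i)| + 2$ of coding function (21). The sketch below assumes a power law truncated to $k \le$ `k_max` (so all sums are finite) and uses hypothetical helper names; it is meant only to illustrate the sandwich $L_t^- \le L \le L_t^+$ and the convergence as $t \to 0$.

```python
import math


def length_distribution(beta=0.5, k_max=2 ** 16 - 1):
    """Distribution of L_i = |b(K_i)| + 2 under a truncated power law.

    Assumption: P(K_i = k) ~ k^(-1/beta) restricted to 1 <= k <= k_max.
    """
    weights = {}
    for k in range(1, k_max + 1):
        length = (k.bit_length() - 1) + 2  # |b(k)| plus the symbols z and 2
        weights[length] = weights.get(length, 0.0) + k ** (-1.0 / beta)
    norm = sum(weights.values())
    return {l: w / norm for l, w in weights.items()}


def mean(dist):
    return sum(l * p for l, p in dist.items())


def index_plus(dist, t):
    """L_t^+ = (1/t) log2 E 2^{t L}."""
    return math.log2(sum(p * 2.0 ** (t * l) for l, p in dist.items())) / t


def index_minus(dist, t):
    """L_t^- = -(1/t) log2 E 2^{-t L}."""
    return -math.log2(sum(p * 2.0 ** (-t * l) for l, p in dist.items())) / t


if __name__ == "__main__":
    d = length_distribution()
    for t in (0.5, 0.1, 0.01, 0.001):
        print(t, index_minus(d, t), mean(d), index_plus(d, t))
```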
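
Define events

$$S_n^+ := \left(\sum_{i=1}^{n} L_i \le n(L_t^+ + \epsilon)\right), \qquad (40)$$
$$S_n^- := \left(\sum_{i=1}^{n} L_i \ge n(L_t^- - \epsilon)\right), \qquad (41)$$
$$T_n^+ := \left(\sum_{i=-n+1}^{-1} L_i \le (n-1)(L_t^+ + \epsilon)\right), \qquad (42)$$
$$T_n^- := \left(\sum_{i=-n+1}^{-1} L_i \ge (n-1)(L_t^- - \epsilon)\right). \qquad (43)$$

Subsequently, we will use the Chernoff bounds:

Lemma 3 For $t > 0$ and $\epsilon > 0$,

$$P(S_n^{+c}) \le 2^{-nt\epsilon}, \qquad (44)$$
$$P(S_n^{-c}) \le 2^{-nt\epsilon}, \qquad (45)$$
$$P(T_n^{+c}) \le 2^{-(n-1)t\epsilon}, \qquad (46)$$
$$P(T_n^{-c}) \le 2^{-(n-1)t\epsilon}. \qquad (47)$$

Proof: Because variables $L_i$ are independent and identically distributed, using
Markov inequality we observe

$$P(S_n^{+c}) = P\left(2^{t \sum_{i=1}^{n} L_i} > 2^{nt(L_t^+ + \epsilon)}\right) \le \frac{\left(\mathbf{E}\, 2^{t L_i}\right)^n}{2^{nt(L_t^+ + \epsilon)}} = \frac{1}{2^{nt\epsilon}}.$$

Analogously we obtain the claims for $S_n^{-c}$, $T_n^{+c}$, and $T_n^{-c}$. □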

Next, for an event $E$, we introduce conditional entropy $H(X|E)$ and mutual
information $I(X; Y|E)$, which are respectively the entropy of variable $X$ and
mutual information between variables $X$ and $Y$ taken with respect to probability
measure $P(\cdot|E)$.

Lemma 4 For the generalized Santa Fe process, let 7 := — P) and s < 
min(t/2,7/2). Then for sufficiently large n, 

P{Sr)H{L,\sr)<^^, (48) 

P [Tr) H {L.,\Tr) < (49) 



Proof: We have 

P{L. = 1)- Y. < / k-"'<^^ = 72-^^'-^^ 
Write $i(p) := -p \log p$. Then for sufficiently large

oo 

oo 

< 5: z(72-'^('-^)) 



l=N 



< ^722t2-t' (-log7 + 70 



< A{'y)N2-'^^. 

Let M = n{Lf + e) and A'' = \ne/2] — 1. Then, for sufficiently large n, 



i=0 \ \j=2 // l=N 

< Ni (p [E^i > M-iV + 1 ) ) + E^l^l-f^i =0) 



^ n(n - l)te2 nA{j)e ^ 1 

Analogously we obtain the claim for $T_n^{+c}$. □
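Now, define events

$$\bar S_n^+ := \left(\sum_{i=1}^{n} \bar L_i \le n(L_t^+ + \epsilon)\right), \qquad (50)$$
$$\bar S_n^- := \left(\sum_{i=1}^{n} \bar L_i \ge n(L_t^- - \epsilon)\right), \qquad (51)$$
$$\bar T_n^+ := \left(\sum_{i=-n+1}^{-1} \bar L_i \le (n-1)(L_t^+ + \epsilon)\right), \qquad (52)$$
$$\bar T_n^- := \left(\sum_{i=-n+1}^{-1} \bar L_i \ge (n-1)(L_t^- - \epsilon)\right), \qquad (53)$$
$$B := \left(\bar L_0 \le l\right). \qquad (54)$$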
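
Lemma 5 For $m \ge n(L_t^+ + \epsilon) + l$ we have

$$I(\bar X_{-n+1:-1}; \bar X_{1:n} | B \cap \bar S_n^+ \cap \bar T_n^+) \le I(\bar Y_{-m+1:0}; \bar Y_{1:m} | B \cap \bar S_n^+ \cap \bar T_n^+), \qquad (55)$$

whereas for $m \le (n - 1)(L_t^- - \epsilon)$ we have

$$I(\bar Y_{-m+1:0}; \bar Y_{1:m} | \bar S_n^- \cap \bar T_n^-) \le I(\bar X_{-n+1:0}; \bar X_{0:n} | \bar S_n^- \cap \bar T_n^-). \qquad (56)$$

Proof: The claims follow by equality (35) and conditional data processing
inequality

$$I(U'; V' | C) \le I(U; V | C),$$

which holds if equalities $U' = g(U)$ and $V' = h(V)$ are satisfied on $C$. □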

There is an additional fact that we shall use. Let $I_C$ be the indicator function
of event $C$. Observe that

$$P(C) I(X; Y | C) \le P(C) I(X; Y | C) + P(C^c) I(X; Y | C^c) = I(X; Y | I_C) = I(X; Y) - I(X; Y; I_C), \qquad (57)$$

where $|I(X; Y; I_C)| \le H(I_C) \le 1$ by the information diagram [?].

Proof of Proposition 5: Observe that

H {Xi\S+') = H {Xi\Li) + H {Li\S+') 

< H (Xo) + H {Li\S+'') , (58) 
H (X_i|r+^) = i/(X_i|L_i) + H (L_i|T,n 

< H (Xo) + H {L^i\T+') . (59) 

because Xi is conditionally independent from S^'^ given Li and X^i is condi- 
tionally independent from given L_i. Now, assume that n is sufficiently 
large so that bounds and hold true. For brevity, define events 

:= n S^. 

Then inequalities (ISH]), ([111, (gl]), (SS]), (gS]), and g!]) yield 

< P (St) nH {X,\S+') + P (r+^) {n - l)H (X.i|r+'=) 






Moreover, assume that m > n{L^ + e) + I. Then applying subsequently ([57| . 
23, dSni), dSIl), and (inni) we obtain 



> p (B n c+) / (F_„.+i:o; n c+) - i 

>P {B)P {C+) I (X_„+i._i;Xi.„|C'+) - 1 
= P{B)P {C+) I (X_„+i:_i;Xi.„|C+) - 1 
>P(B)P(C+) [/(X_„+i:o;Xi.„|C+) -i/(Xo|C+)] -1 
>P{B)P {C+) I (X_„+i:o; Xl:„|C+) - i/(Xo) - 1 

> P (B) [E^{n) -l-P {Ct") I (X_„+i:o; Xi:„|G+')] - i/(Xo) - 1 

> P (B) E^{n) - P {C+") I {X^n+i:-i;Xi.,^\C+") - 2HiXo) - 2 
2n 



>P{B)E^{n) 
Next, define events 



2(n-l)te 



HiXo) 



2n 

2("-i)^ 



(61) 



By (gil) and (gT]) we have 



< (P(fi-^)+P(T-^))mlog3< 



2m 



2(ri-l)t, 



■log 3. 



(62) 



Assume that m < [n— ^){L^ — e). Then applying subsequently ([57)) . 
([57)1. and ^ we obtain 

£;^(n) =/(X_„+i^o;^i:«) 

> (-''^-ri+l:-!; -'^l:™) 
= I (-?-n+l:-i; Xi-n) 

> ^ {X-n+l:0\ -^0:ri) ^ 2i7(Xo) 

> P (C-) / (X_„+i.o;^0:„|C'-) - 1 - 277(Xo) 



> P (C--) / (r_,„+i^o;yi:m|C-) - 2i/(Xo) - 1 

> E^{m) -l-P {C-") I (r-™+l:0;yi:m|C-') - 2i/(Xo) - 1 

2m 



2("-i)* 

From bounds (|6ip and ([55)1 we obtain 
1 



log3-2ff(Xo)-2. 



(63) 



[i( — e]^ n-i-oo 
1 



Mn) ^ Epjm) ^ P(Lo < I) 
limsup z — > limsup „ — > 



lim sup 



liminf ^ > liminf ^ > liminf 



i;,7(n 



/3 



/3 



If we consider $t \to 0$, $\epsilon \to 0$, and $l \to \infty$ then the requested claims will follow
by equation (39) and Proposition 3. □

A Mixing properties



In this appendix we will discuss mixing properties of the generalized Santa Fe
process. The setting makes use of the space of complex valued functions.
Then, for a measure space $(\Omega, \mathcal{J}, \mu)$ let

$$L_0^2(\Omega, \mathcal{J}, \mu) := \left\{f \in L^2(\Omega, \mathcal{J}, \mu) : \int f\, d\mu = 0\right\}$$

and denote the inner product $(f, g)_\mu := \int f \bar g\, d\mu$ and the norm $\|f\|_\mu := \sqrt{(f, f)_\mu}$
for $f, g \in L^2(\Omega, \mathcal{J}, \mu)$. Let also $T : \Omega \to \Omega$ be an invertible transformation that
preserves the measure, $\mu \circ T^{-1} = \mu$. The dynamical system $(\Omega, \mathcal{J}, \mu, T)$ is called
mixing when $\lim_{n\to\infty} (f \circ T^n, g)_\mu = 0$ for $f, g \in L_0^2(\Omega, \mathcal{J}, \mu)$. By the way, we know
that any mixing dynamical system is ergodic [22, Chapter 1, §6].

The following proposition generalizes Theorem 2 from [22, Chapter 10, §1].
Whereas the original claim deals with finite direct products of dynamical sys- 
tems, we will extend it here to infinite products. To the best of our knowledge 
this generalization has not been discussed in the literature so far. The proof is 
similar to the finite case, except for using a different orthonormal basis of the 
product space. 

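Proposition 6 Let $(\Omega_j, \mathcal{J}_j, \mu_j, T_j)$, where $j \in \mathbb{N}$, be dynamical systems with
probability measures, $\mu_j(\Omega_j) = 1$. Consider the direct product $(\Omega, \mathcal{J}, \mu, T)$, where
$\Omega = \times_{j=1}^{\infty} \Omega_j$, $\mathcal{J} = \bigotimes_{j=1}^{\infty} \mathcal{J}_j$, $\mu = \times_{j=1}^{\infty} \mu_j$, and $T(\omega) := (T_j(\omega_j))_{j\in\mathbb{N}}$ for $\omega =
(\omega_j)_{j\in\mathbb{N}}$, $\omega_j \in \Omega_j$. If $(\Omega_j, \mathcal{J}_j, \mu_j, T_j)$ are mixing then $(\Omega, \mathcal{J}, \mu, T)$ is also mixing.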

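Proof: Let $(e_{a_j,j})_{a_j \in A_j}$ be orthonormal bases of spaces $L^2(\Omega_j, \mathcal{J}_j, \mu_j)$ with
$e_{0,j} = 1$ and $e_{a_j,j} \in L_0^2(\Omega_j, \mathcal{J}_j, \mu_j)$ for $a_j \ne 0$. Then the set

$$\{e_\emptyset(\omega) = 1\} \cup \left\{e_a(\omega) = \prod_{j=1}^{k} e_{a_j,j}(\omega_j)\right\}_{a \in A_1 \times A_2 \times \ldots \times (A_k \setminus \{0\}),\ k = 1, 2, \ldots} \qquad (64)$$

with multi-indices $a = (a_1, a_2, \ldots, a_k)$ is an orthonormal basis of the space
$L^2(\Omega, \mathcal{J}, \mu)$, cf. [?, page 29]. (Orthogonality of set (64) is obvious whereas its
completeness follows from the completeness of the analogous orthonormal sets
for finite products and the $L^2$-bounded martingale convergence.) Let $a, a' \ne \emptyset$.
We have $e_a, e_{a'} \in L_0^2(\Omega, \mathcal{J}, \mu)$ and

$$|(e_a \circ T^n, e_{a'})_\mu| = \prod_{j=1}^{k} |(e_{a_j,j} \circ T_j^n, e_{a'_j,j})_{\mu_j}| \le |(e_{a_k,k} \circ T_k^n, e_{a'_k,k})_{\mu_k}| \qquad (65)$$

by Schwarz inequality if $a$ and $a'$ have the same length $k$. Otherwise, $(e_a \circ
T^n, e_{a'})_\mu = 0$. Hence $\lim_{n\to\infty} (e_a \circ T^n, e_{a'})_\mu = 0$ holds by the hypothesis.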

Any other functions $f, g \in L_0^2(\Omega, \mathcal{J}, \mu)$ can be represented as series $f =
\sum_{a \ne \emptyset} f_a e_a$ and $g = \sum_{a \ne \emptyset} g_a e_a$, where $\sum_{a \ne \emptyset} |f_a|^2, \sum_{a \ne \emptyset} |g_a|^2 < \infty$. Assume
without loss of generality that $\|f\|_\mu = \|g\|_\mu = 1$. We will show that for every
$\epsilon > 0$, inequality $|(f \circ T^n, g)_\mu| \le \epsilon$ holds for sufficiently large $n$. Let $F$ and
$G$ be finite subsets of multi-indices such that $\|f - f'\|_\mu, \|g - g'\|_\mu \le \epsilon/4$ for
certain $f' = \sum_{a \in F} f_a e_a$ and $g' = \sum_{a \in G} g_a e_a$, where $f', g' \in L_0^2(\Omega, \mathcal{J}, \mu)$ and
$\|f'\|_\mu, \|g'\|_\mu \le 1$. For sufficiently large $n$, we have $|(f' \circ T^n, g')_\mu| \le \epsilon/4$.
Then

$$|(f \circ T^n, g)_\mu| \le |(f' \circ T^n, g')_\mu| + |((f - f') \circ T^n, g')_\mu| + |(f' \circ T^n, (g - g'))_\mu| + |((f - f') \circ T^n, (g - g'))_\mu|$$
$$\le \epsilon/4 + \|f - f'\|_\mu + \|g - g'\|_\mu + \|f - f'\|_\mu \|g - g'\|_\mu \le \epsilon,$$

which completes the proof. □

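Now let us apply this result to the generalized Santa Fe process. A stochastic
process $(X_i)_{i\in\mathbb{Z}}$ on $(\Omega, \mathcal{J}, P)$, where $X_i : (\Omega, \mathcal{J}) \to (\mathbb{X}, \mathcal{X})$, is called mixing if
$(\mathbb{X}^{\mathbb{Z}}, \mathcal{X}^{\mathbb{Z}}, \mu, T)$ is mixing for $\mu = P((X_k)_{k\in\mathbb{Z}} \in \cdot)$ and $T((x_i)_{i\in\mathbb{Z}}) := (x_{i+1})_{i\in\mathbb{Z}}$.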

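Proof of Proposition 2: Introduce an auxiliary process $(W_i)_{i\in\mathbb{Z}}$, where
$W_i = (K_i, (Z_{ik})_{k\in\mathbb{N}})$. Process $(W_i)_{i\in\mathbb{Z}}$ is a direct product of processes $(K_i)_{i\in\mathbb{Z}}$,
$(Z_{i1})_{i\in\mathbb{Z}}$, $(Z_{i2})_{i\in\mathbb{Z}}$, ..., which are all mixing for $p_k \in (0, 1)$. Hence $(W_i)_{i\in\mathbb{Z}}$ is
mixing by Proposition 6. (In our application, we take $\mu = P((W_i)_{i\in\mathbb{Z}} \in \cdot)$,
$\mu_1 = P((K_i)_{i\in\mathbb{Z}} \in \cdot)$, and $\mu_{k+1} = P((Z_{ik})_{i\in\mathbb{Z}} \in \cdot)$ for $k \ge 1$. The
transformations are $T((w_i)_{i\in\mathbb{Z}}) = (w_{i+1})_{i\in\mathbb{Z}}$, $T_1((k_i)_{i\in\mathbb{Z}}) = (k_{i+1})_{i\in\mathbb{Z}}$, and
$T_{k+1}((z_i)_{i\in\mathbb{Z}}) = (z_{i+1})_{i\in\mathbb{Z}}$ for $k \ge 1$.) Having established the mixing property
for $(W_i)_{i\in\mathbb{Z}}$, we notice that $X_i = f(W_i)$ for a measurable function $f$. Hence
$(X_i)_{i\in\mathbb{Z}}$ is mixing by Theorem 3 from [22, Chapter 10, §1]. □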
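
The mixing of each factor chain $(Z_{ik})_{i\in\mathbb{Z}}$ for $p_k \in (0, 1)$ shows up in the decay of correlations. As a hedged illustration, the sketch below computes $\mathrm{Cov}(Z_{0k}, Z_{nk})$ from powers of the transition matrix and compares it with the closed form $(1 - 2p_k)^n/4$; decay of this covariance is a symptom of mixing rather than a proof, and the function names are introduced here only for the example.

```python
def n_step_matrix(p, n):
    """n-th power of the transition matrix [[1-p, p], [p, 1-p]]."""
    m = [[1.0, 0.0], [0.0, 1.0]]
    step = [[1.0 - p, p], [p, 1.0 - p]]
    for _ in range(n):
        m = [[sum(m[i][k] * step[k][j] for k in range(2)) for j in range(2)]
             for i in range(2)]
    return m


def covariance(p, n):
    """Cov(Z_0, Z_n) for the stationary chain with uniform marginal."""
    # P(Z_0 = 1, Z_n = 1) = (1/2) * P^n[1][1], and E Z_0 = E Z_n = 1/2
    return 0.5 * n_step_matrix(p, n)[1][1] - 0.25


if __name__ == "__main__":
    for n in (1, 5, 20, 100):
        print(n, covariance(0.1, n), (1 - 2 * 0.1) ** n / 4)
```

For $p_k = 0$ the covariance stays at $1/4$ for all $n$, in line with the nonergodic behavior of the original Santa Fe process.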